A Deterministic Polynomial-time Approximation 
Scheme for Counting Knapsack Solutions 

Daniel Stefankovic* Santosh Vempala^ Eric Vigoda^ 

August 11, 2010 



Abstract 

Given $n$ elements with nonnegative integer weights $w_1, \dots, w_n$ and
an integer capacity $C$, we consider the counting version of the classic
knapsack problem: find the number of distinct subsets whose weights
add up to at most the given capacity. We give a deterministic algorithm
that estimates the number of solutions to within relative error
$1 \pm \varepsilon$ in time polynomial in $n$ and $1/\varepsilon$ (fully
polynomial approximation scheme). More precisely, our algorithm takes time
$O(n^3 \varepsilon^{-1} \log(n/\varepsilon))$.
Our algorithm is based on dynamic programming. Previously, randomized
polynomial-time approximation schemes were known, first by
Morris and Sinclair via Markov chain Monte Carlo techniques, and
subsequently by Dyer via dynamic programming and rejection sampling.



1 Introduction 

Randomized algorithms are usually simpler and faster than their determin- 
istic counterparts. In spite of this, it is widely believed that P=BPP (see, 
e.g., [2]), i.e., at least up to polynomial complexity, randomness is not es- 
sential. This conjecture is supported by the fact that there are relatively 
few problems for which exact randomized polynomial-time algorithms exist 
but deterministic ones are not known. Notable among them is the problem
of testing whether a polynomial is identically zero (a special case of this,
primality testing, was open for decades, but a deterministic algorithm is now
known [1]).

However, when one moves to approximation algorithms, there are many 
more such examples. The entire field of approximate counting is based 
on Markov chain Monte Carlo (MCMC) sampling [11], a technique that 
is inherently randomized, and has had remarkable success. The problems 
of counting matchings [9, 12], colorings [8], various tilings, partitions and
arrangements [14], estimating partition functions [10, 16], or volumes [H, 13]
are all solved by first designing a random sampling method and then
reducing counting to repeated sampling. In all these cases, when the input
is presented explicitly, it is conceivable that deterministic polynomial-time
algorithms exist.¹

The one notable example of a deterministic approximate counting algorithm
is Weitz's algorithm [17] for counting independent sets weighted by
an activity $\lambda$ for graphs of maximum degree $\Delta$ when $\Delta$ is
constant and $\lambda < \lambda_u(\Delta)$, where $\lambda_u(\Delta)$ is the
uniqueness threshold for the $\Delta$-regular tree.
This was later extended to counting all matchings of bounded degree graphs
[3]. An alternative deterministic approach of Bandyopadhyay and Gamarnik
[3] for colorings and independent sets of bounded degree graphs only
approximates the logarithm of the size of the feasible set. The results of
[17, 4] are the only two examples of an FPAS (fully polynomial approximation
scheme) for a #P-complete problem that we are aware of. One limitation of
both of these results is that the running time is quite large; in particular,
the exponent depends on $\ln \Delta$. In contrast, our algorithm has a small
polynomial running time.

Here we consider one of the most basic counting problems, namely
approximately counting the number of 0/1 knapsack solutions. More precisely,
we are given a list of nonnegative integer weights $w_1, \dots, w_n$ and an
integer capacity $C$,² and wish to count the number of subsets of the weights
that add up to at most $C$. The decision version of this problem is NP-hard,
but has a well-known pseudo-polynomial algorithm based on dynamic
programming. For any $\varepsilon > 0$, we give a deterministic algorithm that
estimates the number of solutions to within relative error $\varepsilon$ in
time polynomial in $n$ and $1/\varepsilon$.

1 Volume computation has an exponential lower bound for deterministic algorithms,
but that is due to the more general oracle model in which the input is presented.

2 Our results extend to real-valued inputs, but we do not consider that here to avoid
the issue of the model of computation.

Our result follows a line of work in the literature. Dyer et al. [7] gave a
randomized subexponential time algorithm for this problem, based on
near-uniform sampling of feasible solutions by a random walk. Morris and
Sinclair [15] improved this, showing a rapidly mixing Markov chain, and
obtained an FPRAS (fully polynomial randomized approximation scheme). The
proof of convergence of the Markov chain is based on the technique of
canonical paths and a notion of balanced permutations introduced in their
analysis. In a surprising development, Dyer [5] gave a completely different
approach, combining dynamic programming with simple rejection sampling to
also obtain an FPRAS. Although much simpler, randomization still appears to
be essential in his approach: without the sampling part, his algorithm only
gives a factor $n$ approximation.

Our algorithm is also based on dynamic programming and, similar to
Dyer's, is inspired by the pseudo-polynomial algorithm for the
decision/optimization version of the knapsack problem. The complexity of the
latter algorithm is $O(nC)$, where $C$ is the capacity bound. A similar
complexity can be achieved for the counting problem as well using the
following recurrence:

$$S(i,j) = S(i-1, j) + S(i-1, j - w_i)$$

with appropriate initial conditions. Here S(i,j) is the number of knapsack 
solutions that use a subset of the items {1, . . . , i} and their weights sum to 
at most j. 
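
The recurrence above can be turned into a short table-filling routine. The sketch below is a minimal Python illustration, not code from the paper. The function name `count_knapsack_exact` and the initial conditions are assumptions, since the text only says "appropriate initial conditions": with no items allowed there is one solution (the empty set) for every capacity $j \ge 0$ and none for $j < 0$.

```python
def count_knapsack_exact(weights, capacity):
    """Count subsets of `weights` whose total weight is at most `capacity`.

    Assumed initial conditions: S(0, j) = 1 for j >= 0 and S(i, j) = 0 for j < 0.
    """
    # S[j] holds S(i, j) for the items processed so far (i = 0 at the start).
    S = [1] * (capacity + 1)
    for w in weights:
        new = S[:]                       # S(i-1, j): subsets that skip item i
        for j in range(w, capacity + 1):
            new[j] += S[j - w]           # S(i-1, j - w_i): subsets that take item i
        S = new
    return S[capacity]


print(count_knapsack_exact([3, 5, 7, 9], 14))
```

The table has $C + 1$ columns and is updated once per item, so the work is proportional to $nC$, which is pseudo-polynomial in the input size.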

Roughly speaking, since we are only interested in approximate counting, 
Dyer's idea was the following: scale down the capacity to a polynomial in n, 
scale down the weights by the same factor and round down the new weights, 
and then count the solutions to the new problem efficiently using the pseudo- 
polynomial time dynamic programming algorithm. The new problem could 
have more solutions (since we rounded down) but Dyer showed it has at 
most a factor of n more for a suitable choice of scaling. Further, given the 
exact counting algorithm for the new problem, one gets an efficient sampler, 
then uses rejection sampling to only sample solutions to the original problem. 
The sampler leads to a counting algorithm using standard techniques. Dyer's
algorithm has running time $O(n^3 + \varepsilon^{-2} n^2)$ using the above
approach, and $O(n^{2.5}\sqrt{\log(\varepsilon^{-1})} + n^2 \varepsilon^{-2})$
using a more sophisticated approach that also utilizes randomized rounding.

To remove the use of randomness, one might attempt to use a more
coarse-grained dynamic program: rather than consider all integer
capacities $1, 2, \dots, C$, what if we only consider capacities that go up in
some geometric series? This would allow us to reduce the table size to
$n \log C$ rather than $nC$. The problem is that varying the capacity even by
an exponentially small factor $(1 + n/2^n)$ can change the number of solutions
by a constant factor!
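
A small experiment makes this sensitivity concrete. The instance below is chosen here for illustration and is not necessarily the one the authors had in mind: $n$ items all have the same weight $w = 2^n$ (an assumption of this sketch), so a subset is feasible exactly when it has at most $\lfloor C/w \rfloor$ items, and the count is a partial sum of binomial coefficients. The helper name `count_equal_weights` is hypothetical.

```python
from math import comb


def count_equal_weights(n, w, capacity):
    """Number of subsets of n items of equal weight w with total weight <= capacity."""
    k = min(n, capacity // w)            # at most k items fit
    return sum(comb(n, i) for i in range(k + 1))


n = 20
w = 2 ** n
below = count_equal_weights(n, w, w - 1)   # only the empty set fits
at = count_equal_weights(n, w, w)          # the empty set plus the n singletons
print(below, at)
```

Raising the capacity from $2^n - 1$ to $2^n$, a relative change of about $2^{-n}$, multiplies the number of solutions by $n + 1$.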

Instead, we index the table by the prefix of items allowed and the num- 
ber of solutions, with the entry in the table being the minimum capacity 
that allows these indices to be feasible. We can now consider approximate 
numbers of solutions and obtain a small table. 

Our main result is the following: 

Theorem 1.1. Let $w_1, \dots, w_n$ and $C$ be an instance of a knapsack
problem. Let $Z$ be the number of solutions of the knapsack problem. There
is a deterministic algorithm which for any $\varepsilon \in (0, 1)$ outputs
$Z'$ such that $(1 - \varepsilon) Z \le Z' \le Z$. The algorithm runs in time
$O(n^3 \varepsilon^{-1} \log(n/\varepsilon))$.

The running time of our algorithm is competitive with that of Dyer.
One interesting improvement is the dependence on $\varepsilon$. Our algorithm
has a linear dependence on $\varepsilon^{-1}$ (ignoring the logarithm term),
whereas Monte Carlo approaches, including Dyer's algorithm [5] and earlier
algorithms for this problem [15, 7], have running time which depends on
$\varepsilon^{-2}$.

2 Algorithm 

In this section we present our dynamic programming algorithm. Fix a
knapsack instance and fix an ordering on the elements and their weights.

We begin by defining the function
$\tau : \{0, \dots, n\} \times \mathbb{R}_{\ge 0} \to \mathbb{R} \cup \{\pm\infty\}$,
where $\tau(i, a)$ is the smallest $C$ such that there exist at least $a$
solutions to the knapsack problem with weights $w_1, \dots, w_i$ and capacity
$C$. We cannot compute the function $\tau$ efficiently since the second
argument ranges over all real numbers. It will be used in the analysis and it
is useful for motivating the definition of our algorithm.
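
For intuition, the function just defined (written `tau` in the code) can be evaluated by brute force on tiny instances. The sketch below is an illustration only and takes exponential time: it lists all subset sums of the first $i$ weights, sorts them, and returns the $\lceil a \rceil$-th smallest. The name `tau`, the 0-based list `weights`, and the handling of $a = 0$ as $-\infty$ are assumptions of this sketch.

```python
import math
from itertools import combinations


def tau(weights, i, a):
    """Smallest capacity admitting at least `a` solutions using the first i weights."""
    if a == 0:
        return -math.inf                 # zero solutions are required: no capacity needed
    prefix = weights[:i]
    sums = sorted(sum(c) for r in range(i + 1) for c in combinations(prefix, r))
    need = math.ceil(a)                  # at least a solutions means at least ceil(a)
    if need > len(sums):
        return math.inf                  # more solutions requested than subsets exist
    return sums[need - 1]


print(tau([3, 5, 7, 9], 4, 10), tau([3, 5, 7, 9], 4, 11))
```

On the instance with weights 3, 5, 7, 9 the tenth smallest subset sum is 14 and the eleventh is 15, so capacity 14 admits exactly ten solutions.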

Note that, by definition, $\tau(i,a)$ is monotone in $a$, that is,

$$a \le a' \implies \tau(i,a) \le \tau(i,a'). \tag{1}$$

The value of $\tau$ is easy to compute for $i = 0$:

$$\tau(0,a) = \begin{cases} -\infty & \text{if } a = 0, \\ 0 & \text{if } 0 < a \le 1, \\ \infty & \text{otherwise.} \end{cases} \tag{2}$$

Note that the number of knapsack solutions satisfies:

$$Z = \max\{a : \tau(n, a) \le C\}. \tag{3}$$

We will show that $\tau(i,a)$ satisfies the following recurrence.

Lemma 2.1. For any $i \in [n]$ and any $a \in \mathbb{R}_{\ge 0}$ we have

$$\tau(i,a) = \min_{\alpha \in [0,1]} \max\bigl\{ \tau(i-1, \alpha a),\ \tau(i-1, (1-\alpha) a) + w_i \bigr\}. \tag{4}$$

We defer the proof of the above lemma to Section 3.

Now we move to an approximation of $\tau$ that we can compute efficiently.
We define a function $T$ which only considers a small set of values $a$ for
the second argument in the function $\tau$; these values will form a geometric
progression.

Let

$$Q := 1 + \frac{\varepsilon}{n+1}$$

and let

$$s := \lceil n \log_Q 2 \rceil.$$
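
To get a feel for the size of $s$, the following short calculation (a worked aside, not part of the original text) bounds it using only the elementary inequality $\ln(1+x) \ge x/2$ for $0 \le x \le 1$, and then evaluates it on one small, arbitrarily chosen instance.

```latex
% Q = 1 + \varepsilon/(n+1) and s = \lceil n \log_Q 2 \rceil, with 0 < \varepsilon < 1.
\[
  s \;=\; \left\lceil \frac{n \ln 2}{\ln\bigl(1 + \tfrac{\varepsilon}{n+1}\bigr)} \right\rceil
    \;\le\; \left\lceil \frac{2 \ln 2 \cdot n (n+1)}{\varepsilon} \right\rceil
    \;=\; O\!\left(\frac{n^2}{\varepsilon}\right).
\]
% Illustrative numbers: n = 4, \varepsilon = 0.5 gives Q = 1.1 and
\[
  s \;=\; \left\lceil \frac{4 \ln 2}{\ln 1.1} \right\rceil
    \;=\; \lceil 29.09\ldots \rceil \;=\; 30 .
\]
```

Since $Q^s \ge 2^n$, the grid $Q^0, Q^1, \dots, Q^s$ reaches past the largest possible number of solutions.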

The function $T : \{0, \dots, n\} \times \{0, \dots, s\} \to \mathbb{R}_{\ge 0} \cup \{\infty\}$
is defined using the recurrence (4) that the function $\tau$ satisfies.
Namely, $T$ is defined by the following recurrence:

$$T[i,j] = \min_{\alpha \in [0,1]} \max\bigl\{ T[i-1, \lfloor j + \ln_Q \alpha \rfloor],\ T[i-1, \lfloor j + \ln_Q (1-\alpha) \rfloor] + w_i \bigr\}.$$

More precisely, $T$ is defined by the following algorithm CountKnapsack.

CountKnapsack

Input: Integers $w_1, w_2, \dots, w_n$, $C$ and $\varepsilon > 0$.

1. Set $T[0,0] = 0$ and $T[0,j] = \infty$ for $j > 0$.

2. Set $Q = 1 + \varepsilon/(n+1)$ and $s = \lceil n \log_Q 2 \rceil$.

3. For $i = 1 \to n$, for $j = 0 \to s$, set

$$T[i,j] = \min_{\alpha \in [0,1]} \max\bigl\{ T[i-1, \lfloor j + \ln_Q \alpha \rfloor],\ T[i-1, \lfloor j + \ln_Q (1-\alpha) \rfloor] + w_i \bigr\}, \tag{5}$$

where, by convention, $T[i-1, k] = 0$ for $k < 0$.

4. Let

$$j' := \max\{j : T[n,j] \le C\}.$$

5. Output $Z' := Q^{j'+1}$.



The minimum in the recurrence (5), although formally over the entire
interval $[0, 1]$, only needs to be evaluated at the discrete subset where the
second argument goes to the next integer. Hence we will be able to compute
$T$ efficiently.

The key fact is that T approximates r in the following sense. 

Lemma 2.2. Let $i \ge 1$. Assume that for all $j \in \{0, \dots, s\}$ the
entries $T[i-1, j]$ satisfy (6). Then for all $j \in \{0, \dots, s\}$ the entry
$T[i,j]$ computed using (5) satisfies:

$$\tau(i, Q^{j-i}) \le T[i,j] \le \tau(i, Q^{j}). \tag{6}$$

We defer the proof of Lemma 2.2 to Section 3.

We can now prove that the output $Z'$ of the algorithm CountKnapsack
is a $(1 \pm \varepsilon)$ multiplicative approximation of $Z$.

Note that $Z'$ is never an underestimate of $Z$, since

$$C < T[n, j'+1] \le \tau(n, Q^{j'+1}),$$

that is, there are at most $Q^{j'+1}$ solutions. We also have

$$\tau(n, Q^{j'-n}) \le T[n, j'] \le C,$$

that is, there are at least $Q^{j'-n}$ solutions. Hence

$$\frac{Z'}{Z} \le \frac{Q^{j'+1}}{Q^{j'-n}} = Q^{n+1} \le e^{\varepsilon}.$$

This proves that the output $Z'$ of the algorithm CountKnapsack satisfies
the conclusion of Theorem 1.1. It remains to show that the algorithm
can be modified to achieve the claimed running time.
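
The final inequality in the chain above is stated without justification; the following worked step (an aside, not part of the original text) fills it in using only $1 + x \le e^{x}$. The last display is one possible way to relate the one-sided bound obtained here to the two-sided form in Theorem 1.1; whether the authors intend this rescaling is an assumption.

```latex
% With Q = 1 + \varepsilon/(n+1) and 1 + x \le e^{x}:
\[
  Q^{n+1} \;=\; \Bigl(1 + \frac{\varepsilon}{n+1}\Bigr)^{n+1}
          \;\le\; \bigl(e^{\varepsilon/(n+1)}\bigr)^{n+1}
          \;=\; e^{\varepsilon},
  \qquad\text{so}\qquad Z \;\le\; Z' \;\le\; e^{\varepsilon} Z .
\]
% Assumed rescaling (not stated in the text): Z'' := e^{-\varepsilon} Z'.
% Since e^{-\varepsilon} \ge 1 - \varepsilon,
\[
  (1 - \varepsilon)\, Z \;\le\; e^{-\varepsilon} Z \;\le\; Z'' \;\le\; Z .
\]
```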



2.1 Running Time 

As noted earlier, the minimum in the recurrence (5) only needs to be
evaluated at the discrete subset $S$ where the second argument goes to the
next integer. For $j \in \{0, 1, \dots, s\}$, the set $S$ is $S = S_1 \cup S_2$
where:

$$S_1 = \{Q^{-j}, \dots, Q^{0}\} \quad\text{and}\quad S_2 = \{1 - Q^{0}, \dots, 1 - Q^{-j}\}.$$

Thus, $T[i,j]$ can be computed in $O(s)$ time. Since there are $O(ns)$ entries
of the table and $s = O(n^2/\varepsilon)$, the algorithm CountKnapsack can be
implemented in $O(ns^2) = O(n^5/\varepsilon^2)$ time.
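
The table-filling procedure with a linear scan per entry can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. Assumptions made here: the name `count_knapsack_approx`; entries with a finite negative index are taken to be 0; the boundary $\alpha = 1$ is treated as "item $i$ is unused", contributing $T[i-1, j]$; and for each breakpoint $\alpha = Q^{-t}$ the recurrence is evaluated just below the breakpoint, where the first index is $j - t - 1$. Floating-point logarithms are used, so an index could in principle be off by one when $j + \ln_Q(1 - Q^{-t})$ is within rounding error of an integer.

```python
import math


def count_knapsack_approx(weights, capacity, eps):
    """Return an estimate Z' of the number of subsets with total weight <= capacity."""
    n = len(weights)
    Q = 1.0 + eps / (n + 1)
    s = math.ceil(n * math.log(2) / math.log(Q))

    def entry(row, k):
        # Assumed convention: T[i-1, k] = 0 for every finite k < 0.
        return 0.0 if k < 0 else row[k]

    prev = [0.0] + [math.inf] * s        # T[0, 0] = 0 and T[0, j] = infinity for j > 0
    for w in weights:
        cur = []
        for j in range(s + 1):
            best = prev[j]               # alpha = 1: every counted solution skips item i
            for t in range(j + 1):
                # alpha just below Q^(-t): the first index is j - t - 1.
                first = entry(prev, j - t - 1)
                if t == 0:
                    second = 0.0         # 1 - alpha is tiny, so the index is finite and negative
                else:
                    k2 = math.floor(j + math.log(1.0 - Q ** (-t)) / math.log(Q))
                    second = entry(prev, k2)
                best = min(best, max(first, second + w))
            cur.append(best)
        prev = cur

    j_star = max(j for j in range(s + 1) if prev[j] <= capacity)
    return Q ** (j_star + 1)


print(count_knapsack_approx([3, 5, 7, 9], 14, 0.5))
```

Each entry costs $O(s)$ lookups, so this sketch matches the $O(ns^2)$ bound above rather than the improved running time. On small instances its output can be compared against an exhaustive count to check that it lies between $Z$ and $e^{\varepsilon} Z$.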

To improve the running time, recall that $\tau(i, a)$ is a non-decreasing
function in $a$. Similarly, it is easy to see by induction that $T[i,j]$ is a
non-decreasing function in $j$. Hence, in (5), the first argument in the
maximum (namely, $T[i-1, \lfloor j + \ln_Q \alpha \rfloor]$) is non-decreasing
in $\alpha$. Similarly, the second argument in the maximum is a non-increasing
function in $\alpha$. Hence the minimum of the maximum of the two arguments
occurs either at the boundary (that is, for $\alpha \in \{0, 1\}$) or for
$\alpha \in (0, 1)$ where the maximum changes from decreasing to increasing,
that is, $\alpha$ such that for $\beta < \alpha$

$$T[i-1, \lfloor j + \ln_Q \beta \rfloor] < T[i-1, \lfloor j + \ln_Q (1-\beta) \rfloor] + w_i,$$

and for $\beta > \alpha$

$$T[i-1, \lfloor j + \ln_Q \beta \rfloor] \ge T[i-1, \lfloor j + \ln_Q (1-\beta) \rfloor] + w_i.$$

Therefore, if we had the set $S$ in sorted order, we could find the $\alpha$
that achieves the minimum in (5) using binary search in $O(\log s)$ time. We
do not have $S$ in sorted order, but we do have $S_1$ and $S_2$ in sorted
order. We can instead do binary search over $S_1$ to find the
$\alpha \in S_1$ that achieves the minimum over that set, and then over $S_2$,
and finally compare the two values. Therefore, step 3 of the algorithm
CountKnapsack to compute $T[i,j]$ can be implemented in $O(\log s)$ time, and
the entire algorithm then takes $O(n^3 \varepsilon^{-1} \log(n/\varepsilon))$
time. This completes the proof of the running time claimed in Theorem 1.1.
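
The binary search step relies on a generic fact: the minimum of the maximum of a non-increasing and a non-decreasing sequence sits at their crossing point. The helper below is a minimal sketch of that fact alone. The name `min_of_max` and its callable arguments are hypothetical; wiring it to table lookups along one of the sorted candidate lists is not shown and would be an additional assumption.

```python
def min_of_max(m, dec, inc):
    """Minimum over t in range(m) of max(dec(t), inc(t)).

    Assumes m >= 1, dec is non-increasing in t and inc is non-decreasing in t.
    Uses O(log m) evaluations of each function.
    """
    lo, hi = 0, m                        # search for the first t with inc(t) >= dec(t)
    while lo < hi:
        mid = (lo + hi) // 2
        if inc(mid) >= dec(mid):
            hi = mid
        else:
            lo = mid + 1
    best = float("inf")
    if lo < m:
        best = inc(lo)                   # from the crossing on, the maximum is inc(t)
    if lo > 0:
        best = min(best, dec(lo - 1))    # before the crossing, the maximum is dec(t)
    return best


dec_vals = [9, 7, 4, 2]
inc_vals = [1, 3, 5, 8]
print(min_of_max(4, lambda t: dec_vals[t], lambda t: inc_vals[t]))
```

Only the two positions adjacent to the crossing need to be compared, which is why a sorted candidate list can be handled with $O(\log s)$ lookups.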
3 Proofs of Lemmas



Here we present the proofs of the earlier lemmas. 

We begin with the proof of Lemma 2.1, which presents the recurrence for
the function $\tau(i,a)$.

Proof of Lemma 2.1. Fix any $\alpha \in [0,1]$. Let
$B = \max\{\tau(i-1, \alpha a),\ \tau(i-1, (1-\alpha) a) + w_i\}$. There exist
at least $\alpha a$ solutions with weights $w_1, \dots, w_{i-1}$ and capacity
$B \ge \tau(i-1, \alpha a)$. There exist at least $(1-\alpha) a$ solutions with
weights $w_1, \dots, w_{i-1}$ and capacity
$B - w_i \ge \tau(i-1, (1-\alpha) a)$. Hence there exist at least $a$ solutions
with weights $w_1, \dots, w_i$ and capacity $B$, and thus $\tau(i,a) \le B$. To
see that we did not double count, note that the first type of solutions (of
which there are at least $\alpha a$) has $x_i = 0$ and the second type of
solutions (of which there are at least $(1-\alpha) a$) has $x_i = 1$.

We established

$$\tau(i,a) \le \min_{\alpha \in [0,1]} \max\bigl\{ \tau(i-1, \alpha a),\ \tau(i-1, (1-\alpha) a) + w_i \bigr\}. \tag{7}$$

Consider the knapsack problem with weights $w_1, \dots, w_i$ and capacity
$C = \tau(i,a)$, which has at least $a$ solutions. Let $\beta$ be the fraction
of the solutions that do not include item $i$. Then
$\tau(i-1, \beta a) \le C$, $\tau(i-1, (1-\beta) a) \le C - w_i$, and hence

$$\max\{\tau(i-1, \beta a),\ \tau(i-1, (1-\beta) a) + w_i\} \le C = \tau(i, a).$$

We established

$$\tau(i,a) \ge \min_{\alpha \in [0,1]} \max\bigl\{ \tau(i-1, \alpha a),\ \tau(i-1, (1-\alpha) a) + w_i \bigr\}. \tag{8}$$

Equations (7) and (8) yield (4). □

We now prove Lemma 2.2, that the function $T$ approximates $\tau$.

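Proof of Lemma 2.2. By the assumption of the lemma and (1) we have

$$T[i-1, \lfloor j + \ln_Q \alpha \rfloor] \ge \tau\bigl(i-1, Q^{\lfloor j + \ln_Q \alpha \rfloor - (i-1)}\bigr) \ge \tau(i-1, \alpha Q^{j-i}) \tag{9}$$

and

$$T[i-1, \lfloor j + \ln_Q (1-\alpha) \rfloor] \ge \tau\bigl(i-1, Q^{\lfloor j + \ln_Q (1-\alpha) \rfloor - (i-1)}\bigr) \ge \tau(i-1, (1-\alpha) Q^{j-i}). \tag{10}$$

Combining (9) and (10) with min and max operators we obtain

$$\min_{\alpha \in [0,1]} \max\bigl\{ T[i-1, \lfloor j + \ln_Q \alpha \rfloor],\ T[i-1, \lfloor j + \ln_Q (1-\alpha) \rfloor] + w_i \bigr\} \ge \min_{\alpha \in [0,1]} \max\bigl\{ \tau(i-1, \alpha Q^{j-i}),\ \tau(i-1, (1-\alpha) Q^{j-i}) + w_i \bigr\} = \tau(i, Q^{j-i}),$$

establishing that $T[i,j]$ computed using (5) satisfies the lower bound in (6).

By the assumption of the lemma and (1) we have

$$T[i-1, \lfloor j + \ln_Q \alpha \rfloor] \le \tau\bigl(i-1, Q^{\lfloor j + \ln_Q \alpha \rfloor}\bigr) \le \tau(i-1, \alpha Q^{j}) \tag{11}$$

and

$$T[i-1, \lfloor j + \ln_Q (1-\alpha) \rfloor] \le \tau\bigl(i-1, Q^{\lfloor j + \ln_Q (1-\alpha) \rfloor}\bigr) \le \tau(i-1, (1-\alpha) Q^{j}). \tag{12}$$

Combining (11) and (12) with min and max operators we obtain

$$\min_{\alpha \in [0,1]} \max\bigl\{ T[i-1, \lfloor j + \ln_Q \alpha \rfloor],\ T[i-1, \lfloor j + \ln_Q (1-\alpha) \rfloor] + w_i \bigr\} \le \min_{\alpha \in [0,1]} \max\bigl\{ \tau(i-1, \alpha Q^{j}),\ \tau(i-1, (1-\alpha) Q^{j}) + w_i \bigr\} = \tau(i, Q^{j}),$$

establishing that $T[i,j]$ computed using (5) satisfies the upper bound in (6).

□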